1. INTRODUCTION

Suicide-related behaviours (SRBs) need to be prevented in psychiatric patients. SRBs include
suicide attempts and instrumental SRBs [1]. Suicide is the act of intentionally causing one's own death [2]. Risk
factors include mental disorders such as depression, bipolar disorder, schizophrenia, personality disorders,
alcoholism, and substance misuse [3], [4]. People whose SRBs do not result in death are at high risk of
future self-injury and completed suicide [5], [6].

Predicting SRBs from patient medical records would be very useful for prevention by the psychiatric
hospital. This research develops such a prediction at the only psychiatric hospital in Bali Province using
the Smooth Support Vector Machine (SSVM) method, a further development of the Support Vector Machine
(SVM) [7]-[9]. According to [10], SVM relies on quadratic programming optimization, which makes it less
efficient for high-dimensional and large data. For that reason, a smoothing technique replaces the plus
function of SVM with the integral of the neural network sigmoid function; this technique is known as SSVM.
Compared with SVM, SSVM has better running time and accuracy. SSVM generates and solves an unconstrained
smooth reformulation of the SVM for pattern classification using a completely arbitrary kernel [8]. SSVM is
solved by a Newton-Armijo algorithm and has been extended to nonlinear separation surfaces by using
nonlinear kernel techniques. The numerical results show that SSVM is faster than other methods and has
better generalization ability [7].


2. SMOOTH SUPPORT VECTOR MACHINE 

As the basis of SSVM, SVM [11] is a method for finding the optimal hyperplane that separates two
classes in the input space. Separation of more than two classes was conducted previously by the authors on
fingerprint data [12]-[15]. Figure 1 shows several alternative hyperplanes (discrimination boundaries) and
the best hyperplane for a data set that consists of two classes, i.e. class {-1} and class {+1}. The best
hyperplane is the one with the maximum margin among the alternative dividing lines (discriminant
boundaries). The margin is the distance between the hyperplane and the nearest point of each class. These
nearest points are called support vectors [16].

Figure 1. Several alternative hyperplanes (left), the best hyperplane (right)


The classification of m points in the n-dimensional space $R^n$ is represented by an $m \times n$ matrix
A. The membership of each point $A_i$ in class {-1} or {+1} is defined by an $m \times m$ diagonal matrix D
with -1 or +1 on its diagonal. The linear SVM is given by (1), subject to the constraints
$D(Aw - e\gamma) + y \ge e$ and $y \ge 0$, where $\nu$ is a positive SVM parameter; y is an $m \times 1$
non-negative slack vector that measures classification error; e is an $m \times 1$ column vector of ones;
w is the $n \times 1$ normal vector; and $\gamma$ is the bias that determines the location of the hyperplane
relative to the origin.

$$\min_{(w,\gamma,y) \in R^{n+1+m}} \nu e'y + \frac{1}{2} w'w \tag{1}$$


The constraints above are compared component by component. When the two classes can be separated
perfectly by the hyperplane $x'w - \gamma = 0$, there are two parallel hyperplanes that bound the two
classes, i.e. $x'w - \gamma = -1$ for class {-1} and $x'w - \gamma = +1$ for class {+1}. A non-linear
separating surface is obtained by transforming the standard SVM formulation through (2) and by applying the
"kernel trick" with a Gaussian kernel function (3), where $\mu$ is the kernel parameter and i, j = 1, 2, ..., m.

$$w = A'Du \tag{2}$$

$$K(x_i, x_j) = \exp(-\mu \|x_i - x_j\|^2), \quad \mu > 0 \tag{3}$$
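
The Gaussian kernel (3) can be evaluated for all pairs of rows at once. The following is a minimal NumPy sketch, not the routine of the SSVM toolbox [17]; the function name `gaussian_kernel`, the toy matrix and the value `mu=0.5` are assumptions chosen only for illustration.

```python
import numpy as np

def gaussian_kernel(A, B, mu):
    """Return K with K[i, j] = exp(-mu * ||A[i] - B[j]||^2), as in (3)."""
    A = np.asarray(A, dtype=float)
    B = np.asarray(B, dtype=float)
    # Squared distances via ||a||^2 + ||b||^2 - 2 a.b, clipped at 0 against round-off.
    sq = (A**2).sum(axis=1)[:, None] + (B**2).sum(axis=1)[None, :] - 2.0 * A @ B.T
    return np.exp(-mu * np.maximum(sq, 0.0))

# Hypothetical toy data: three points in the plane.
A = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 2.0]])
K = gaussian_kernel(A, A, mu=0.5)
```

The result plays the role of K(A, A'): it is symmetric with ones on the diagonal, and larger values of mu make the entries decay faster with distance.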


Substituting (2) into (1) gives the problem shown by (4), subject to the constraints
$D(AA'Du - e\gamma) + y \ge e$ and $y \ge 0$.

$$\min_{(u,\gamma,y) \in R^{2m+1}} \nu e'y + \frac{1}{2} u'DAA'Du \tag{4}$$


Replacing $AA'$ with a non-linear kernel $K(A, A')$, and minimizing the slack variable y through $y'y$
weighted by $\nu/2$, gives the generalized non-linear SVM shown by (5), subject to the constraints
$D(K(A, A')Du - e\gamma) + y \ge e$ and $y \ge 0$. The resulting separating surface is $K(x', A')Du = \gamma$.

$$\min_{(u,\gamma,y) \in R^{2m+1}} \frac{\nu}{2} y'y + \frac{1}{2}(u'u + \gamma^2) \tag{5}$$

To solve (5), the constraints are expressed through (6).

$$y = (e - D(K(A, A')Du - e\gamma))_+ \tag{6}$$


Substituting (6) into (5) yields an equivalent unconstrained SVM optimization problem, as shown
by (7).

$$\min_{(u,\gamma) \in R^{m+1}} \frac{\nu}{2} \|(e - D(K(A, A')Du - e\gamma))_+\|_2^2 + \frac{1}{2}(u'u + \gamma^2) \tag{7}$$


In $(\cdot)_+$, negative components are replaced by zeros. Problem (7) has a unique solution, but its
objective function is not twice differentiable, which precludes the use of a fast Newton method. For that
reason, a smoothing technique was proposed [10] that replaces the plus function $(\cdot)_+$ with
$p(\cdot, \alpha)$, the integral of the neural network sigmoid function $(1 + \exp(-\alpha x))^{-1}$.
Equation (8) shows the SSVM, where $\alpha$ is the smoothing parameter.

$$\min_{(u,\gamma) \in R^{m+1}} \Phi_\alpha(u, \gamma) := \frac{\nu}{2} \|p(e - D(K(A, A')Du - e\gamma), \alpha)\|_2^2 + \frac{1}{2}(u'u + \gamma^2) \tag{8}$$
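
To make the smoothing step concrete, the sketch below compares the plus function with the integral of the sigmoid, which has the closed form p(x, α) = x + (1/α) log(1 + exp(-αx)). The function names and the sample values of α are assumptions for illustration only. The gap between the two functions is largest at x = 0, where it equals log(2)/α, so it shrinks as α grows.

```python
import numpy as np

def plus(x):
    """Plus function: negative components are replaced by zeros."""
    return np.maximum(x, 0.0)

def smooth_plus(x, alpha):
    """Integral of the sigmoid 1 / (1 + exp(-alpha * x)).

    logaddexp(0, -alpha * x) evaluates log(1 + exp(-alpha * x)) without overflow.
    """
    return x + np.logaddexp(0.0, -alpha * x) / alpha

# Sample points include x = 0, where the approximation gap is largest.
x = np.linspace(-2.0, 2.0, 9)
gap = {alpha: float(np.max(smooth_plus(x, alpha) - plus(x))) for alpha in (1.0, 5.0, 50.0)}
```

Because p is smooth for any finite α, the objective in (8) is twice differentiable, which is the property that allows a Newton method to be used.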


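Problem (8) can be solved numerically with the Newton-Armijo method, stated here in terms of
$(w, \gamma)$; for the kernel form (8), u takes the place of w. The first step is to initialize
$(w^0, \gamma^0) \in R^{n+1}$, where $w^i$ denotes the i-th iterate of w. The second step is to repeat the
iteration until the gradient of the objective function in (8) is zero, i.e.
$\nabla\Phi_\alpha(w^i, \gamma^i) = 0$. The third step is to compute $(w^{i+1}, \gamma^{i+1})$ as follows:
a. Newton direction: determine the direction $d^i \in R^{n+1}$, as shown by (9).

$$\nabla^2\Phi_\alpha(w^i, \gamma^i)\, d^i = -\nabla\Phi_\alpha(w^i, \gamma^i)' \tag{9}$$

b. Armijo stepsize: choose the stepsize $\lambda_i \in R$ such that

$$(w^{i+1}, \gamma^{i+1}) = (w^i, \gamma^i) + \lambda_i d^i \tag{10}$$

where $\lambda_i = \max\{1, \frac{1}{2}, \frac{1}{4}, \ldots\}$ such that

$$\Phi_\alpha(w^i, \gamma^i) - \Phi_\alpha((w^i, \gamma^i) + \lambda_i d^i) \ge -\delta \lambda_i \nabla\Phi_\alpha(w^i, \gamma^i)\, d^i \tag{11}$$

where $\delta \in (0, \frac{1}{2})$.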


When $\nabla\Phi_\alpha(w^i, \gamma^i) = 0$, the Newton-Armijo iteration stops and the converged values
of w and $\gamma$ define the decision function shown by (12).

$$f(x) = \operatorname{sign}(x'w - \gamma) \tag{12}$$
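
The Newton-Armijo steps (9)-(11) can be sketched for the linear case, where the objective is (ν/2)‖p(e - D(Aw - eγ), α)‖² + ½(w'w + γ²). This is an illustrative NumPy sketch under stated assumptions, not the implementation of the SSVM toolbox [17]: the function name `ssvm_linear`, the default values `nu=1.0`, `alpha=5.0`, `delta=0.25`, the stopping tolerance and the synthetic two-cluster data are all hypothetical.

```python
import numpy as np

def smooth_plus(x, alpha):
    # Integral of the sigmoid: x + (1/alpha) * log(1 + exp(-alpha * x)).
    return x + np.logaddexp(0.0, -alpha * x) / alpha

def sigmoid(t):
    # Numerically stable form of 1 / (1 + exp(-t)).
    return 0.5 * (1.0 + np.tanh(0.5 * t))

def ssvm_linear(A, d, nu=1.0, alpha=5.0, delta=0.25, tol=1e-8, max_iter=50):
    """Minimize the smooth linear SSVM objective with Newton-Armijo steps."""
    m, n = A.shape
    # Rows of D[A, -e], so that H @ z equals D(Aw - e*gamma) for z = (w, gamma).
    H = d[:, None] * np.hstack([A, -np.ones((m, 1))])
    z = np.zeros(n + 1)

    def phi(v):
        p = smooth_plus(1.0 - H @ v, alpha)
        return 0.5 * nu * (p @ p) + 0.5 * (v @ v)

    for _ in range(max_iter):
        r = 1.0 - H @ z
        p = smooth_plus(r, alpha)
        s = sigmoid(alpha * r)                     # derivative of p
        grad = -nu * H.T @ (p * s) + z
        if np.linalg.norm(grad) <= tol:            # gradient (numerically) zero
            break
        curv = s * s + p * alpha * s * (1.0 - s)   # second-derivative weights
        hess = nu * (H.T * curv) @ H + np.eye(n + 1)
        direction = np.linalg.solve(hess, -grad)   # Newton direction, as in (9)
        lam, f0, slope = 1.0, phi(z), grad @ direction
        # Armijo rule (11): halve the stepsize until the decrease is sufficient.
        while f0 - phi(z + lam * direction) < -delta * lam * slope and lam > 1e-12:
            lam *= 0.5
        z = z + lam * direction                    # update, as in (10)
    return z[:n], z[n]

# Hypothetical, well-separated toy data (not the hospital data).
rng = np.random.default_rng(0)
A = np.vstack([rng.normal(-2.0, 0.5, (20, 2)), rng.normal(2.0, 0.5, (20, 2))])
d = np.array([-1.0] * 20 + [1.0] * 20)
w, gamma = ssvm_linear(A, d)
pred = np.sign(A @ w - gamma)                      # decision function (12)
```

The Hessian is the identity plus a positive semidefinite term, so the Newton system always has a unique solution and the computed direction is a descent direction.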


3. RESEARCH METHOD 

For SRBs prediction using SSVM, the research method consisted of five stages, i.e. 1) data
preparation; 2) data transformation; 3) training and testing data selection; 4) SSVM model development; and
5) SSVM model evaluation.

Data preparation concerns the collection of data from the electronic and non-electronic medical
records of the only psychiatric hospital in Bali Province. There are 30,660 inpatient and outpatient medical
records from the last five years up to April 2016. Data were collected through database queries on the
hospital information system and then exported to CSV format. Data cleaning left 2665 relevant records for
this research, including 111 patients who have SRBs and are under active treatment. Removed records met one
or more of three conditions, i.e. 1) data not belonging to a psychiatric-disorder patient (applicant for a
drug-free or psychiatric-disorder-free certificate, dental patient, drug patient, neurology patient, or
physiotherapy patient); 2) incomplete data (manual data that were migrated into the information system for
a patient who has not been an inpatient or outpatient since the migration); and 3) inactive patient data
(patient who has passed away or is no longer an outpatient).

Data transformation converts the prepared data into predictor variables and a response variable.
Ten predictor variables were obtained from the database query above, i.e. disease diagnosis (x1),
profession (x2), education (x3), payment type/health insurance type (x4), domicile (x5), age (x6), age
range (x7), sex (x8), marital status (x9), and family history (x10). The response variable (y) takes the
value -1 or +1, representing the class of patients that have no SRBs and the class of patients that have
SRBs, respectively. Response variable data were obtained from the non-electronic medical records related to
suicide attempts or instrumental SRBs [1]. Figure 2 shows a research sample of the predictor (instance) and
response (label) matrices as used with the SSVM toolbox library [17]. Each row of the instance matrix
represents one patient and each column one of the ten predictor variables.


[Figure 2 image: instance matrix and label vector]

Figure 2. Predictor and response matrix


Training and testing data selection prepares the next stages of SSVM model development and
evaluation. Two data selection mechanisms were used to obtain the best hyperplane of Figure 1:

a. Ten-fold cross validation (10-fcv) selection [18] and data ratio selection on the 2665 relevant data.
k-fold cross validation splits the data into k sections at random. Each section keeps the same class
proportion as the initial data. Each section is used in turn as testing data while the rest is used as
training data, so there are k accuracy values. The final accuracy is the average of those k values. In
data ratio selection, data of patients that have SRBs and of patients that have no SRBs are randomly
selected in a given ratio for training and testing data. For example, ratio 90:10 means that 90% of the
relevant data are training data and 10% are testing data. Training data consist of 90% of the data of
patients that have SRBs and 90% of the data of patients that have no SRBs, while testing data consist of
the remaining 10% of each class;

b. Data ratio selection based on the 111 data of patients that have SRBs among the 2665 relevant data.
For example, data ratio 1:2 means that the training data (also used as testing data) consist of the 111
data of patients that have SRBs and 222 data of patients that have no SRBs. Several of the best SSVM
models obtained were then tested on randomly selected data amounting to 10%, 20%, ... and 100% of the
2665 relevant data.
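
The two selection mechanisms above can be sketched as index-sampling routines. This is a hedged illustration only: the function names `stratified_split` and `ratio_sample`, the rounding rule and the random seed are assumptions, and the label vector merely reproduces the class counts reported in the text (111 patients with SRBs, 2554 without).

```python
import numpy as np

def stratified_split(labels, train_fraction, rng):
    """Mechanism (a): split each class separately so both parts keep the class proportion."""
    labels = np.asarray(labels)
    train_idx = []
    for cls in (-1, 1):
        idx = rng.permutation(np.flatnonzero(labels == cls))
        n_train = int(round(train_fraction * idx.size))
        train_idx.extend(idx[:n_train])
    train_idx = np.sort(np.array(train_idx))
    test_idx = np.setdiff1d(np.arange(labels.size), train_idx)
    return train_idx, test_idx

def ratio_sample(labels, negatives_per_positive, rng):
    """Mechanism (b): keep every SRB patient and draw a multiple of non-SRB patients."""
    labels = np.asarray(labels)
    pos = np.flatnonzero(labels == 1)
    neg = np.flatnonzero(labels == -1)
    n_neg = int(round(negatives_per_positive * pos.size))
    chosen = rng.choice(neg, size=n_neg, replace=False)
    return np.sort(np.concatenate([pos, chosen]))

rng = np.random.default_rng(0)
labels = np.array([1] * 111 + [-1] * 2554)           # class counts from the text
train_idx, test_idx = stratified_split(labels, 0.90, rng)
balanced_idx = ratio_sample(labels, 1.0, rng)         # data ratio 1:1
ratio_1_to_2_idx = ratio_sample(labels, 2.0, rng)     # data ratio 1:2
```

With these counts, the 90:10 split leaves 266 testing records, 11 of them with SRBs, and the 1:1 and 1:2 ratios give 222 and 333 records respectively.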

SSVM model development concerns the parameters w and γ of (12), which were computed using the SSVM
toolbox library [17]. Several values of w and γ were computed for the various training and testing data
selections above. SSVM model evaluation concerns the classification accuracy, which can be determined from
a contingency table [19], as shown by Table 1. Based on that table, the classification accuracy is measured
by (13).


Table 1. Classification accuracy contingency

Actual | Prediction I (Negative) | Prediction II (Positive)
Negative | True Negative (TN) | False Positive (FP)
Positive | False Negative (FN) | True Positive (TP)

$$\text{accuracy (\%)} = \frac{TN + TP}{TP + FP + FN + TN} \tag{13}$$

where TN is the number of patients predicted to have no SRBs who in fact have no such behaviours; TP is
the number of patients predicted to have SRBs who in fact have such behaviours; FP is the number of
patients predicted to have SRBs who in fact have no such behaviours; and FN is the number of patients
predicted to have no SRBs who in fact have such behaviours.
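
As a worked instance of (13), the short sketch below evaluates the accuracy for two rows reported in the result tables: the 10-fcv row of Table 2 and the 1:1 row of Table 3. The function name `accuracy` is an assumption, and the value is returned as a fraction, as it is listed in the tables.

```python
def accuracy(tn, fp, fn, tp):
    """Classification accuracy (13), returned as a fraction between 0 and 1."""
    return (tn + tp) / (tn + fp + fn + tp)

# Table 2, 10-fcv row: every patient is predicted as having no SRBs.
all_negative = accuracy(tn=2554, fp=0, fn=111, tp=0)

# Table 3, data ratio 1:1 row.
ratio_1_to_1 = accuracy(tn=101, fp=10, fn=57, tp=54)
```

The first case shows that accuracy alone can look high (about 0.958) even when TP is zero, because TN dominates the numerator on imbalanced data.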


4. RESULTS AND ANALYSIS

The experiment followed the two selection mechanisms of training and testing data described above
and ran on an Intel Core™ i5-4460T CPU @1.90GHz with 4GB RAM and the Windows 10 64-bit operating system.


4.1. Data selection mechanism 1 

Table 2 shows high accuracy for the SSVM model under every data selection type, but TP is zero in
all of them, so none of those models can be used to predict patients that have SRBs. The high accuracy
comes from the high number of TN in (13), since most patients in the data of this research have no SRBs.


4.2. Data selection mechanism 2 

Table 3 shows non-zero TP for six SSVM models, obtained with the six data ratio selections from
1:0.5 up to 1:1. Each of those SSVM models was then tested on ten portions of the 2665 data, so each model
gives ten accuracy results, whose average is shown by Figure 3.


Table 2. SSVM Model Performance by using Data Selection Mechanism 1

Selection | Testing Data | w | γ | TN | FN | FP | TP | Accuracy | Time (s)
10-fcv | variable | 0.01 | 0.0017 | 2554 | 111 | 0 | 0 | 0.9583 | 907.954
90:10 | 266 | 0.01 | 0.0018 | 255 | 11 | 0 | 0 | 0.9586 | 722.597
80:20 | 533 | 0.01 | 0.0019 | 511 | 22 | 0 | 0 | 0.9587 | 555.397
70:30 | 799 | 0.01 | 0.0016 | 766 | 33 | 0 | 0 | 0.9587 | 361.729
60:40 | 1065 | 0.01 | 0.0023 | 1021 | 22 | 0 | 0 | 0.9587 | 250.639
50:50 | 1282 | 0.01 | 0.0018 | 1227 | 55 | 0 | 0 | 0.9571 | 165.321


Table 3. SSVM Model Performance by using Data Selection Mechanism 2

Selection | Testing Data | w | γ | TN | FN | FP | TP | Accuracy | Time (s)
1:0.5 | (row values illegible in source)
1:0.6 | (row values illegible in source)
1:0.7 | 189 | 1.78E+03 | 5.79E-04 | 77 | 52 | 1 | 59 | 0.7196 | 1.627
1:0.8 | (row values illegible in source)
1:0.9 | 211 | 1.00E+04 | 7.78E-05 | 96 | 49 | 4 | 62 | 0.7488 | 2.634
1:1 | 222 | 1.7783 | 2.67E-05 | 101 | 57 | 10 | 54 | 0.6982 | 2.094
1:2 | 333 | 0.0562 | 0.0323 | 222 | 111 | 0 | 0 | 0.6667 | 4.155
1:3 | 444 | 0.0562 | 0.0268 | 333 | 111 | 0 | 0 | 0.75 | 10.11
1:4 | 555 | 0.0562 | 0.0245 | 444 | 111 | 0 | 0 | 0.8 | 18.337
1:5 | 666 | 0.0562 | 0.0216 | 555 | 111 | 0 | 0 | 0.8333 | 32.273
1:6 | 717 | 0.01 | 0.002 | 666 | 111 | 0 | 0 | 0.8571 | 40.845
1:7 | 888 | 0.01 | 0.003 | 777 | 111 | 0 | 0 | 0.875 | 69.532
1:8 | 999 | 0.01 | 0.0022 | 888 | 111 | 0 | 0 | 0.8889 | 80.796
1:9 | 1110 | 0.01 | 0.0021 | 999 | 111 | 0 | 0 | 0.9 | 127.757
1:10 | 1221 | 0.01 | 0.0022 | 1110 | 111 | 0 | 0 | 0.9091 | 155.998
1:11 | 1332 | 0.01 | 0.0021 | 1221 | 111 | 0 | 0 | 0.9167 | 197.339
1:12 | 1443 | 0.01 | 0.0028 | 1332 | 111 | 0 | 0 | 0.9231 | 233.046
1:13 | 1554 | 0.01 | 0.0015 | 1443 | 111 | 0 | 0 | 0.9286 | 224.112
1:14 | 1665 | 0.01 | 0.0021 | 1554 | 111 | 0 | 0 | 0.9333 | 278.644
1:15 | 1776 | 0.01 | 0.0022 | 1665 | 111 | 0 | 0 | 0.9375 | 327.802
1:16 | 1887 | 0.01 | 0.0023 | 1776 | 111 | 0 | 0 | 0.9412 | 396.622
1:17 | 1998 | 0.01 | 0.0021 | 1887 | 111 | 0 | 0 | 0.9444 | 465.335
1:18 | 2109 | 0.01 | 0.0022 | 1998 | 111 | 0 | 0 | 0.9474 | 560.466
1:19 | 2220 | 0.01 | 0.0017 | 2109 | 111 | 0 | 0 | 0.95 | 576.861
1:20 | 2331 | 0.01 | 0.0023 | 2220 | 111 | 0 | 0 | 0.9524 | 637.395
1:21 | 2442 | 0.01 | 0.0024 | 2331 | 111 | 0 | 0 | 0.9545 | 768.092
1:22 | 2553 | 0.01 | 0.0023 | 2442 | 111 | 0 | 0 | 0.9565 | 860.427
1:23 | 2665 | 0.01 | 0.0017 | 2554 | 111 | 0 | 0 | 0.9583 | 907.954


Based on Figure 3, the SSVM model generated with data ratio 1:1 (training data consisting of 111
data of patients that have SRBs and 111 data of patients that have no SRBs) gave the best average accuracy,
at about 63%. Most of the running time was spent on SSVM model development, i.e. computing the parameters w
and γ of (12). Running time did not increase significantly across the different portions of relevant data,
as shown by Table 4.

Figure 3. Six SSVM models performance by using six data ratio selections

Table 4. SSVM Model Performance by using Data Ratio 1:1

Data Portion | Testing Data | TN | FN | FP | TP | Accuracy | Time (s)
10% | 266 | 170 | 4 | 85 | 7 | 0.6654 | 2.042
20% | 533 | 394 | 12 | 117 | 10 | 0.75797 | 2.044
30% | 799 | 672 | 19 | 94 | 14 | 0.8586 | 2.042
40% | 1065 | 531 | 27 | 490 | 17 | 0.51455 | 2.202
50% | 1383 | 860 | 28 | 467 | 28 | 0.6421 | 2.042
60% | 1600 | 959 | 30 | 574 | 37 | 0.6225 | 2.034
70% | 1866 | 818 | 38 | 970 | 40 | 0.4598 | 2.02
80% | 2243 | 1207 | 45 | 947 | 44 | 0.5577 | 2.038
90% | 2399 | 1320 | 53 | 979 | 47 | 0.5698 | 2.064
100% | 2665 | 1490 | 57 | 1064 | 54 | 0.57936 | 2.058
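
The average accuracy of the 1:1 model can be checked directly from the ten accuracy values listed in Table 4. The snippet below is a small arithmetic sketch with hypothetical variable names. It gives a mean of roughly 0.62, close to the value of about 63% read from Figure 3, with the highest single accuracy at the 30% portion.

```python
from statistics import mean

portions = ["10%", "20%", "30%", "40%", "50%", "60%", "70%", "80%", "90%", "100%"]
# Accuracy column of Table 4 (data ratio 1:1), in the same order as the portions.
table4_accuracy = [0.6654, 0.75797, 0.8586, 0.51455, 0.6421,
                   0.6225, 0.4598, 0.5577, 0.5698, 0.57936]

average_accuracy = mean(table4_accuracy)
best_portion = portions[table4_accuracy.index(max(table4_accuracy))]
```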


The results above were based on the ten predictor variables described previously. Table 5 gives
the influence of each of those variables on the SSVM model performance, using data ratio 1:1 and testing
data at 30% of the 2665 relevant data, as in Table 4.


Table 5. SSVM Model Performance on Reduced Predictor Variables

No | Reduced Predictor Variables | TN | FN | FP | TP | Accuracy
1 | X1, X2, X3, X4, X5, X6, X7, X8, X9 | 629 | 18 | 137 | 15 | 0.80601
2 | X1, X2, X3, X4, X5, X6, X7, X8, X10 | 629 | 18 | 137 | 15 | 0.80601
3 | X1, X2, X3, X4, X5, X6, X7, X9, X10 | 629 | 18 | 137 | 15 | 0.80601
4 | X1, X2, X3, X4, X5, X6, X8, X9, X10 | 629 | 18 | 137 | 15 | 0.80601
5 | X1, X2, X3, X4, X5, X7, X8, X9, X10 | 619 | 18 | 147 | 15 | 0.79349
6 | X1, X2, X3, X4, X6, X7, X8, X9, X10 | 667 | 30 | 99 | 3 | 0.83855
7 | X1, X2, X3, X5, X6, X7, X8, X9, X10 | 759 | 32 | 7 | 1 | 0.95129
8 | X1, X2, X4, X5, X6, X7, X8, X9, X10 | 722 | 30 | 44 | 3 | 0.9074
9 | X1, X3, X4, X5, X6, X7, X8, X9, X10 | 737 | 33 | 29 | 0 | 0.9224
10 | X2, X3, X4, X5, X6, X7, X8, X9, X10 | 716 | 32 | 50 | 1 | 0.8974

Table 5 shows that six predictor variables, i.e. disease diagnosis (x1), profession (x2), education (x3),
payment type/health insurance type (x4), domicile (x5), and age (x6), have much influence on the SSVM model
result, because TP decreases and/or FP increases when any one of those variables is removed.
Hypothetically, age range (x7), sex (x8), marital status (x9), and family history (x10) also influence the
prediction. However, the age range of 19 to 45 years (used as a reference by the psychiatric hospital)
apparently has relatively small influence in this SSVM model, as do sex, marital status, and family
history, even though female sex, unmarried status, and social network are considered risk factors of
SRBs [1].


5. CONCLUSION AND FUTURE WORK 

Suicide-related behaviours (SRBs) prediction with SSVM gave a best average accuracy of about 63%.
This accuracy can be obtained by using 30% of the 2665 relevant data as testing data and by using training
data with a one-to-one ratio between patients that have SRBs and patients that have no SRBs. In future
work, accuracy improvement needs to be confirmed by using the Reduced Support Vector Machine (RSVM) method,
a further development of SSVM [10].